Sociolinguistically Driven Approaches for Just Natural Language Processing
Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and indeed they can be harmful: NLP systems reproduce stereotypes, prevent speakers of non-standard language varieties from participating fully in public discourse, and re-inscribe historical patterns of linguistic stigmatization and discrimination. How harms arise in NLP systems, and who is harmed by them, can only be understood at the intersection of work on NLP, fairness and justice in machine learning, and the relationships between language and social justice. In this thesis, we propose to address two questions at this intersection: (i) how can we conceptualize harms arising from NLP systems, and (ii) how can we quantify such harms?
We propose the following contributions. First, we contribute a model to collect the first large dataset of African American Language (AAL)-like social media text. We use the dataset to quantify the performance of two types of NLP systems, identifying disparities in model performance between Mainstream U.S. English (MUSE)- and AAL-like text. Turning to the landscape of bias in NLP more broadly, we then provide a critical survey of the emerging literature on bias in NLP and identify its limitations. Drawing on work across sociology, sociolinguistics, linguistic anthropology, social psychology, and education, we provide an account of the relationships between language and injustice, propose a taxonomy of harms arising from NLP systems grounded in those relationships, and propose a set of guiding research questions for work on bias in NLP. Finally, we adapt the measurement modeling framework from the quantitative social sciences to effectively evaluate approaches for quantifying bias in NLP systems. We conclude with a discussion of recent work on bias through the lens of style in NLP, raising a set of normative questions for future work.
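As a concrete illustration of the disparity measurement described in this abstract, the sketch below computes per-dialect accuracy for a predictor. It is a minimal sketch under stated assumptions: the field names, the toy examples, and the stand-in classifier are illustrative inventions, not the thesis's actual model or data.

```python
# Minimal sketch: per-dialect accuracy of an NLP system, used to surface
# performance disparities between MUSE-like and AAL-like text.
# All data fields and the toy predictor below are hypothetical.
from collections import defaultdict

def accuracy_by_dialect(examples, predict):
    """Compute accuracy separately for each dialect group.

    examples: iterable of dicts with 'text', 'label', and 'dialect' keys
              (here 'MUSE' or 'AAL'); predict: function from text to label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["dialect"]] += 1
        correct[ex["dialect"]] += int(predict(ex["text"]) == ex["label"])
    return {d: correct[d] / total[d] for d in total}

# Toy usage: a degenerate "language identifier" that only recognizes
# MUSE-like spelling, producing a disparity like those the thesis measures.
data = [
    {"text": "he is going home", "label": "en", "dialect": "MUSE"},
    {"text": "he goin home", "label": "en", "dialect": "AAL"},
]
toy_predict = lambda t: "en" if "going" in t else "other"  # stand-in model
print(accuracy_by_dialect(data, toy_predict))
# -> {'MUSE': 1.0, 'AAL': 0.0}: a per-dialect performance disparity
```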
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
We survey 146 papers analyzing "bias" in NLP systems, finding that their
motivations are often vague, inconsistent, and lacking in normative reasoning,
despite the fact that analyzing "bias" is an inherently normative process. We
further find that these papers' proposed quantitative techniques for measuring
or mitigating "bias" are poorly matched to their motivations and do not engage
with the relevant literature outside of NLP. Based on these findings, we
describe the beginnings of a path forward by proposing three recommendations
that should guide work analyzing "bias" in NLP systems. These recommendations
rest on a greater recognition of the relationships between language and social
hierarchies, encouraging researchers and practitioners to articulate their
conceptualizations of "bias"---i.e., what kinds of system behaviors are
harmful, in what ways, to whom, and why, as well as the normative reasoning
underlying these statements---and to center work around the lived experiences
of members of communities affected by NLP systems, while interrogating and
reimagining the power relations between technologists and such communities.
"One-size-fits-all"? Observations and Expectations of NLG Systems Across Identity-Related Language Features
Fairness-related assumptions about what constitutes appropriate NLG system
behaviors range from invariance, where systems are expected to respond
identically to social groups, to adaptation, where responses should instead
vary across them. We design and conduct five case studies, in which we perturb
different types of identity-related language features (names, roles, locations,
dialect, and style) in NLG system inputs to illuminate tensions around
invariance and adaptation. We outline people's expectations of system
behaviors, and surface potential caveats of these two contrasting yet
commonly-held assumptions. We find that motivations for adaptation include
social norms, cultural differences, feature-specific information, and
accommodation; motivations for invariance include perspectives that favor
prescriptivism, view adaptation as unnecessary or too difficult for NLG systems
to do appropriately, and are wary of false assumptions. Our findings highlight
open challenges around defining what constitutes fair NLG system behavior.
Comment: 36 pages, 24 figures
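To make the perturbation setup concrete, here is a minimal sketch assuming a hypothetical prompt template and name list (not the paper's actual stimuli): the same template is instantiated with different identity-related feature values, and an NLG system's responses to each variant can then be compared for invariance versus adaptation.

```python
# Sketch of an identity-feature perturbation case study: one template,
# perturbed over a single feature type (here, names). The template and
# the name list are illustrative assumptions.
TEMPLATE = "Write a short biography of {name}, a software engineer."
NAMES = ["Emily", "Lakisha", "Jamal", "Brad"]  # hypothetical perturbation set

def perturbed_prompts(template, names):
    """Instantiate one prompt per perturbed feature value."""
    return {name: template.format(name=name) for name in names}

for name, prompt in perturbed_prompts(TEMPLATE, NAMES).items():
    print(f"{name}: {prompt}")
# Identical system responses across these prompts would indicate invariance;
# systematic differences would indicate adaptation, which may or may not be
# the desired behavior.
```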
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
There are many ways to express similar things in text, which makes evaluating
natural language generation (NLG) systems difficult. Compounding this
difficulty is the need to assess varying quality criteria depending on the
deployment setting. While the landscape of NLG evaluation has been well-mapped,
practitioners' goals, assumptions, and constraints -- which inform decisions
about what, when, and how to evaluate -- are often partially or implicitly
stated, or not stated at all. Combining a formative semi-structured interview
study of NLG practitioners (N=18) with a survey study of a broader sample of
practitioners (N=61), we surface goals, community practices, assumptions, and
constraints that shape NLG evaluations, examining their implications and how
they embody ethical considerations.
Comment: Camera Ready for NAACL 2022 (Main Conference)
How to write a bias statement: Recommendations for submissions to the Workshop on Gender Bias in NLP
At the Workshop on Gender Bias in NLP (GeBNLP), we'd like to encourage
authors to give explicit consideration to the wider aspects of bias and its
social implications. For the 2020 edition of the workshop, we therefore
requested that all authors include an explicit bias statement in their work to
clarify how their work relates to the social context in which NLP systems are
used.
The programme committee of the workshop included a number of reviewers with
a background in the humanities and social sciences, in addition to NLP experts
doing the bulk of the reviewing. Each paper was assigned one of those
reviewers, and they were asked to pay specific attention to the provided bias
statements in their reviews. This initiative was well received by the authors
who submitted papers to the workshop, several of whom said they received useful
suggestions and pointers to relevant literature from the bias reviewers. We are therefore
planning to keep this feature of the review process in future editions of the
workshop.
Comment: This document was originally published as a blog post on the website of GeBNLP 2020.
Taxonomizing and Measuring Representational Harms: A Look at Image Tagging
In this paper, we examine computational approaches for measuring the "fairness" of image tagging systems, finding that they cluster into five distinct categories, each with its own analytic foundation. We also identify a range of normative concerns that are often collapsed under the terms "unfairness," "bias," or even "discrimination" when discussing problematic cases of image tagging. Specifically, we identify four types of representational harms that can be caused by image tagging systems, providing concrete examples of each. We then consider how different computational measurement approaches map to each of these types, demonstrating that there is not a one-to-one mapping. Our findings emphasize that no single measurement approach will be definitive and that it is not possible to infer from the use of a particular measurement approach which type of harm was intended to be measured. Lastly, equipped with this more granular understanding of the types of representational harms that can be caused by image tagging systems, we show that attempts to mitigate some of these types of harms may be in tension with one another.
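As one concrete illustration of the kind of measurement approach surveyed here, the sketch below compares how often a tagger assigns a given tag across demographic groups. It is a minimal sketch of one parity-style measurement, not the paper's taxonomy; the group labels, tags, and data are hypothetical.

```python
# Sketch of one family of image-tagging fairness measurements: comparing
# the rate at which a tag is assigned across demographic groups.
# Groups, tags, and data below are illustrative assumptions.
from collections import Counter

def tag_rate_by_group(images, tag):
    """Fraction of images in each group that received `tag`.

    images: iterable of dicts with 'group' and 'tags' (a set of strings).
    """
    hits, totals = Counter(), Counter()
    for img in images:
        totals[img["group"]] += 1
        hits[img["group"]] += int(tag in img["tags"])
    return {g: hits[g] / totals[g] for g in totals}

# Toy usage: a disparity in how often 'professional' is applied.
sample = [
    {"group": "A", "tags": {"person", "professional"}},
    {"group": "A", "tags": {"person", "professional"}},
    {"group": "B", "tags": {"person"}},
    {"group": "B", "tags": {"person", "professional"}},
]
print(tag_rate_by_group(sample, "professional"))  # {'A': 1.0, 'B': 0.5}
# Note: a parity measure like this captures some representational harms
# (e.g., differential tagging that stereotypes) but not others (e.g., a
# demeaning tag applied at equal rates), echoing the paper's point that
# no single measurement approach is definitive.
```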